253 research outputs found

    Evaluating openEHR for storing computable representations of electronic health record phenotyping algorithms

    Electronic Health Records (EHR) are data generated during routine clinical care. EHR offer researchers unprecedented phenotypic breadth and depth and have the potential to accelerate the pace of precision medicine at scale. A main EHR use case is creating phenotyping algorithms to define disease status, onset and severity. Currently, no common machine-readable standard exists for defining phenotyping algorithms, which are often stored in human-readable formats. As a result, translating algorithms into implementation code is challenging, and sharing them across the scientific community is problematic. In this paper, we evaluate openEHR, a formal EHR data specification, for computable representations of EHR phenotyping algorithms.
    Comment: 30th IEEE International Symposium on Computer-Based Medical Systems (IEEE CBMS 2017)
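
    To make the contrast between human-readable and computable phenotype definitions concrete, here is a minimal sketch of a phenotyping algorithm expressed as data plus a trivial evaluator. The rule structure and code lists are invented for illustration; they are not the paper's openEHR representation.

    ```python
    # Illustrative only: a phenotype as a machine-readable rule set rather
    # than free text. Codes and structure are hypothetical.
    HEART_FAILURE_PHENOTYPE = {
        "name": "heart_failure",
        "any_of": [
            {"source": "hospital", "codes": {"I50.0", "I50.1", "I50.9"}},  # ICD-10
            {"source": "primary_care", "codes": {"G58..", "G580."}},       # Read v2
        ],
    }

    def evaluate(phenotype, patient_records):
        """Return True if any record matches a rule's source and code set."""
        return any(
            rec["source"] == rule["source"] and rec["code"] in rule["codes"]
            for rule in phenotype["any_of"]
            for rec in patient_records
        )

    records = [{"source": "hospital", "code": "I50.9"}]
    print(evaluate(HEART_FAILURE_PHENOTYPE, records))  # True
    ```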

    Neural-signature methods for structured EHR prediction

    Models that can effectively represent structured Electronic Healthcare Records (EHR) are central to an increasing range of applications in healthcare. Due to the sequential nature of health data, Recurrent Neural Networks have emerged as the dominant component within state-of-the-art architectures. The signature transform represents an alternative modelling paradigm for sequential data. This transform provides a non-learnt approach to creating a fixed vector representation of temporal features and has shown strong performance across an increasing number of domains, including medical data. However, the signature method has not yet been applied to structured EHR data. To this end, we follow recent work that enables the signature to be used as a differentiable layer within a neural architecture, enabling application in high-dimensional domains where calculation would previously have been intractable. Using a heart failure prediction task as an exemplar, we provide an empirical evaluation of different variations of the signature method and compare against state-of-the-art baselines. This first application of neural-signature methods to real-world healthcare data shows competitive performance when compared to strong baselines and thus warrants further investigation within the health domain.
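
    For intuition about what the signature transform computes, here is a minimal NumPy sketch of the non-learnt depth-2 signature of a piecewise-linear path. The paper embeds the transform as a differentiable layer inside a neural network, which this sketch does not do; the toy data are invented.

    ```python
    import numpy as np

    def signature_depth2(path):
        """Depth-2 signature of a piecewise-linear path (T steps, d channels).

        Level 1 is the total increment; level 2 collects ordered products of
        increments (iterated integrals), capturing ordering information that
        summary statistics such as the mean discard.
        """
        inc = np.diff(path, axis=0)                  # (T-1, d) increments
        level1 = inc.sum(axis=0)                     # (d,)
        before = np.cumsum(inc, axis=0) - inc        # increments strictly before t
        level2 = before.T @ inc + 0.5 * inc.T @ inc  # (d, d)
        return np.concatenate([level1, level2.ravel()])

    # Toy "vitals" path: 5 time points, 2 channels -> fixed 2 + 4 = 6 features.
    x = np.array([[0., 1.], [1., 1.], [1., 2.], [2., 3.], [3., 5.]])
    print(signature_depth2(x))
    ```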

    Evaluation of Semantic Web Technologies for Storing Computable Definitions of Electronic Health Records Phenotyping Algorithms

    Electronic Health Records (EHR) are electronic data generated during, or as a byproduct of, routine patient care. Structured, semi-structured and unstructured EHR offer researchers unprecedented phenotypic breadth and depth and have the potential to accelerate the development of precision medicine approaches at scale. A main EHR use case is defining phenotyping algorithms that identify disease status, onset and severity. Phenotyping algorithms utilize diagnoses, prescriptions, laboratory tests, symptoms and other elements in order to identify patients with or without a specific trait. No common standardized, structured, computable format exists for storing phenotyping algorithms. The majority of algorithms are stored as human-readable descriptive text documents, which makes their translation to code challenging due to their inherent complexity and hinders their sharing and re-use across the community. In this paper, we evaluate two key Semantic Web technologies, the Web Ontology Language and the Resource Description Framework, for enabling computable representations of EHR-driven phenotyping algorithms.
    Comment: Accepted at the American Medical Informatics Association (AMIA) Annual Symposium 2018
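
    As a flavour of the RDF side of this idea, here is a hedged sketch using the rdflib library to express a phenotyping rule as triples. The namespace and property names are invented for illustration; the paper's actual OWL/RDF modelling is richer than this.

    ```python
    # Minimal sketch: a phenotyping rule as RDF triples (hypothetical schema).
    from rdflib import Graph, Literal, Namespace, RDF

    PHENO = Namespace("http://example.org/phenotype#")

    g = Graph()
    g.bind("pheno", PHENO)

    algo = PHENO["heart_failure"]
    g.add((algo, RDF.type, PHENO.PhenotypingAlgorithm))
    g.add((algo, PHENO.usesCodeSystem, Literal("ICD-10")))
    g.add((algo, PHENO.includesCode, Literal("I50.9")))

    # Serialise to Turtle: a machine-readable artefact that can be shared,
    # queried with SPARQL and validated, unlike a free-text description.
    print(g.serialize(format="turtle"))
    ```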

    Association between clinical presentations before myocardial infarction and coronary mortality: a prospective population-based study using linked electronic records.

    BACKGROUND: Ischaemia in different arterial territories before acute myocardial infarction (AMI) may influence post-AMI outcomes. No studies have evaluated prospectively collected information on ischaemia and its effect on short- and long-term coronary mortality. The objective of this study was to compare patients with and without prospectively measured ischaemic presentations before AMI in terms of infarct characteristics and coronary mortality. METHODS AND RESULTS: As part of the CALIBER programme, we linked data from primary care, hospital admissions, the national acute coronary syndrome registry and cause-specific mortality to identify patients with first AMI (n = 16,439). We analysed time from AMI to coronary mortality (n = 5283 deaths) using Cox regression (median 2.6 years follow-up), comparing patients with and without recent ischaemic presentations. Patients with ischaemic presentations in the 90 days before AMI experienced lower coronary mortality in the first 7 days after AMI compared with those with no prior ischaemic presentations, after adjusting for age, sex, smoking, diabetes, blood pressure and cardiovascular medications [HR: 0.64 (95% CI: 0.57-0.73) P < 0.001], but subsequent mortality was higher [HR: 1.42 (1.13-1.77) P = 0.001]. Patients with ischaemic presentations closer in time to AMI had the lowest seven-day mortality (P-trend = 0.001). CONCLUSION: In the first large prospective study of ischaemic presentations prior to AMI, we have shown that those occurring closest to AMI are associated with lower short-term coronary mortality following AMI, which could represent a natural ischaemic preconditioning effect observed in a clinical setting. CLINICAL TRIALS REGISTRATION: Clinicaltrials.gov identifier NCT01604486.
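
    For readers unfamiliar with the analysis style, here is a hedged sketch of an adjusted Cox model of the kind described above, using the lifelines library on randomly generated data. Column names and values are invented; the study's CALIBER data and covariate handling are far more involved.

    ```python
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "recent_ischaemia": rng.integers(0, 2, n),  # presentation in prior 90 days
        "age": rng.normal(70, 10, n).round(),
        "diabetes": rng.integers(0, 2, n),
        "days_to_event": rng.exponential(900, n).round() + 1,
        "coronary_death": rng.integers(0, 2, n),    # 0 = censored
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="days_to_event", event_col="coronary_death")
    cph.print_summary()  # hazard ratios with 95% CIs, as reported in the study
    ```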

    Diagnostic windows in non-neoplastic diseases: a systematic review

    BACKGROUND: Investigating changes in prediagnostic healthcare utilisation can help identify how much earlier conditions could be diagnosed. Such 'diagnostic windows' are established for cancer but remain relatively unexplored for non-neoplastic conditions. AIM: To extract evidence on the presence and length of diagnostic windows for non-neoplastic conditions. DESIGN AND SETTING: A systematic review of studies of prediagnostic healthcare utilisation was carried out. METHOD: A search strategy was developed to identify relevant studies from PubMed and Connected Papers. Data were extracted on prediagnostic healthcare use, and evidence of diagnostic window presence and length was assessed. RESULTS: Of 4340 studies screened, 27 were included, covering 17 non-neoplastic conditions, including both chronic (for example, Parkinson's disease) and acute conditions (for example, stroke). Prediagnostic healthcare events included primary care encounters and presentations with relevant symptoms. For 10 conditions, sufficient evidence to determine diagnostic window presence and length was available, ranging from 28 days (herpes simplex encephalitis) to 9 years (ulcerative colitis). For the remaining conditions, diagnostic windows were likely to be present, but insufficient study duration was often a barrier to robustly determining their length, meaning that diagnostic window length may exceed 10 years for coeliac disease, for example. CONCLUSION: Evidence of changing healthcare use before diagnosis exists for many non-neoplastic conditions, establishing that early diagnosis is possible, in principle. In particular, some conditions may be detectable many years earlier than they are currently diagnosed. Further research is required to accurately estimate diagnostic windows, to determine how much earlier diagnosis may be possible, and to establish how this might be achieved.

    Diffsurv: Differentiable sorting for censored time-to-event data

    Survival analysis is a crucial semi-supervised task in machine learning with numerous real-world applications, particularly in healthcare. Currently, the most common approach to survival analysis is based on Cox's partial likelihood, which can be interpreted as a ranking model optimized on a lower bound of the concordance index. This relation between ranking models and Cox's partial likelihood considers only pairwise comparisons. Recent work has developed differentiable sorting methods which relax this pairwise independence assumption, enabling the ranking of sets of samples. However, current differentiable sorting methods cannot account for censoring, a key factor in many real-world datasets. To address this limitation, we propose a novel method called Diffsurv. We extend differentiable sorting methods to handle censored tasks by predicting matrices of possible permutations that take into account the label uncertainty introduced by censored samples. We contrast this approach with methods derived from partial likelihood and ranking losses. Our experiments show that Diffsurv outperforms established baselines in various simulated and real-world risk prediction scenarios. Additionally, we demonstrate the benefits of the algorithmic supervision enabled by Diffsurv by presenting a novel method for top-k risk prediction that outperforms current methods.
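
    The pairwise baseline the abstract refers to can be made concrete. Below is a PyTorch sketch of the standard negative Cox partial likelihood used as a differentiable, ranking-style loss; this is the classical objective that Diffsurv generalises, not the paper's differentiable-sorting method itself.

    ```python
    import torch

    def cox_partial_likelihood(risk, time, event):
        """Negative log partial likelihood.

        risk:  (n,) predicted log-risk scores
        time:  (n,) follow-up times
        event: (n,) 1 if the event was observed, 0 if censored
        Each observed event is compared against the risk set of samples still
        at risk at that time -- an implicit pairwise ranking objective.
        """
        order = torch.argsort(time, descending=True)      # longest follow-up first
        risk, event = risk[order], event[order]
        log_risk_set = torch.logcumsumexp(risk, dim=0)    # log of risk-set sums
        return -((risk - log_risk_set) * event).sum() / event.sum()

    risk = torch.randn(8, requires_grad=True)
    time = torch.rand(8)
    event = torch.tensor([1., 0., 1., 1., 0., 1., 0., 1.])
    loss = cox_partial_likelihood(risk, time, event)
    loss.backward()  # differentiable, so usable as a neural network loss
    ```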

    ClustEHR: a tool for generating synthetic EHR data for unsupervised learning experiments.

    Objectives: Clustering algorithms are commonly used to identify disease clusters. New clustering algorithms are benchmarked on synthetic data to assess their accuracy. These datasets lack the complexities of real electronic health record (EHR) data, producing only a partial assessment of the algorithm. We developed a synthetic EHR cluster generator for benchmarking clustering algorithms. Approach: We created clustEHR, a synthetic EHR cluster generator based on Synthea (a synthetic EHR generator), which produces datasets of known clusters with parameterized noise and cluster separation and with clinically relevant patient outcomes. We evaluated clustEHR by generating multiple datasets with variable cluster separation and percentage of noise variables, to reflect easy and hard clustering problems. We used a linear model to assess the relationship between these parameters and cluster problem difficulty, with k-means accuracy as a proxy for difficulty. Results: We have developed a tool for generating synthetic EHR cluster data with clinically relevant outcomes based on the rate of decline of medical observations (e.g. blood pressure). The following parameters are supported: a) number of clusters; b) number of patients in each cluster; c) number and data type of features; d) separation, by defining clusters either as distinct diseases such as COPD or dementia (high separability) or as related conditions within a disease, such as emphysema and chronic bronchitis within COPD (low separability); and e) noise variables, identified as variables not predictive of true cluster outcomes using a random forest feature importance metric. We show that high cluster separation significantly increases k-means accuracy (coefficient 0.33). A smaller percentage of noise variables increases accuracy, though not significantly (coefficient 0.42). Conclusion: ClustEHR offers realistic mixed data types as well as outcomes, which are frequently used to evaluate clusters when subtyping diseases. The evaluation results suggest that the difficulty of the cluster data can be user-determined. The tool can be used to create realistic datasets for evaluating clustering approaches.
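
    A toy illustration of the evaluation design (cluster separation and noise variables versus k-means recovery) can be built with scikit-learn. This sketch does not use clustEHR or Synthea, and all parameters are invented; it only mimics the benchmarking idea.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import adjusted_rand_score

    rng = np.random.default_rng(0)
    for sep in (1.0, 5.0):                     # low vs high cluster separation
        X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.0,
                          center_box=(-sep, sep), random_state=0)
        noise = rng.normal(size=(300, 5))      # 5 noise variables, no signal
        X = np.hstack([X, noise])
        labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
        print(f"separation={sep}: ARI={adjusted_rand_score(y, labels):.2f}")
    ```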

    Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data

    Electronic health records (EHR) are increasingly being used for constructing disease risk prediction models. Feature engineering in EHR data, however, is challenging due to their highly dimensional and heterogeneous nature. Low-dimensional representations of EHR data can potentially mitigate these challenges. In this paper, we use global vectors (GloVe) to learn word embeddings for diagnoses and procedures recorded using 13 million ontology terms across 2.7 million hospitalisations in national UK EHR. We demonstrate the utility of these embeddings by evaluating their performance in identifying patients who are at higher risk of being hospitalised for congestive heart failure. Our findings indicate that embeddings can enable the creation of robust EHR-derived disease risk prediction models and address some of the limitations associated with manual clinical feature engineering.
    Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018, arXiv:1811.0721
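
    The underlying idea is to treat each hospitalisation's codes as a "sentence" and learn embeddings from co-occurrence. As a sketch of the same idea with a readily available tool, here is a word2vec stand-in using gensim (not the GloVe implementation the paper used); the codes and parameters are invented.

    ```python
    from gensim.models import Word2Vec

    # Each "sentence" is the list of codes recorded during one hospitalisation.
    hospitalisations = [
        ["I50.9", "I10", "N18.3"],        # heart failure, hypertension, CKD
        ["I21.9", "I10", "E11.9"],        # MI, hypertension, type 2 diabetes
        ["I50.9", "E11.9", "N18.3"],
    ]

    model = Word2Vec(hospitalisations, vector_size=32, window=5,
                     min_count=1, sg=1, epochs=50)

    # Codes that co-occur across admissions end up close in embedding space;
    # the resulting vectors can serve as features for risk prediction models.
    print(model.wv.most_similar("I50.9", topn=2))
    ```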

    Exploring Hybrid Parallel Systems for Probabilistic Record Linkage

    Record linkage is a technique widely used to gather data stored in disparate data sources that presumably pertain to the same real-world entity. This integration can be done deterministically or probabilistically, depending on the existence of common key attributes among all data sources involved. The probabilistic approach is very time-consuming due to the number of records that must be compared, particularly in big data scenarios. In this paper, we propose and evaluate a methodology that simultaneously exploits multicore and multi-GPU architectures in order to perform the probabilistic linkage of large-scale Brazilian governmental databases. We present some algorithmic optimizations that provide high accuracy and improve performance by defining the best algorithm-architecture combination for a problem given its input size. We also discuss performance results obtained with different data samples, showing that a hybrid approach outperforms other configurations, providing an average speedup of 7.9 when linking up to 20,000 million records.
    This work has been partially supported by CNPq, FAPESB, the Bill & Melinda Gates Foundation, The Royal Society (UK), the Medical Research Council (UK), the NVIDIA Hardware Grant Program, Generalitat Valenciana (grant PROMETEOII/2014/003), the Spanish Government and the European Commission through TEC2015-67387-C4-1-R (MINECO/FEDER), and network CAPAP-H. We have also worked in cooperation with the EU-COST Programme Action IC1305, "Network for Sustainable Ultrascale Computing (NESUS)".
    Boratto, M.; Alonso-Jordá, P.; Pinto, C.; Melo, P.; Barreto, M.; Denaxas, S. (2019). Exploring Hybrid Parallel Systems for Probabilistic Record Linkage. The Journal of Supercomputing, 75, 1137-1149. https://doi.org/10.1007/s11227-018-2328-3
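
    For context on what each pairwise comparison computes, here is a minimal sketch of Fellegi-Sunter scoring, the classical model underpinning probabilistic record linkage. The m/u probabilities are invented; real pipelines estimate them from data and run the comparisons in parallel, which is the workload the paper accelerates.

    ```python
    import math

    M = {"name": 0.95, "birth_year": 0.98, "city": 0.90}   # P(agree | match)
    U = {"name": 0.01, "birth_year": 0.05, "city": 0.20}   # P(agree | non-match)

    def match_weight(rec_a, rec_b):
        """Sum of log-likelihood ratios over the compared fields."""
        w = 0.0
        for field in M:
            if rec_a[field] == rec_b[field]:
                w += math.log2(M[field] / U[field])
            else:
                w += math.log2((1 - M[field]) / (1 - U[field]))
        return w  # high weight -> likely the same real-world entity

    a = {"name": "maria silva", "birth_year": 1980, "city": "salvador"}
    b = {"name": "maria silva", "birth_year": 1980, "city": "sao paulo"}
    print(match_weight(a, b))
    ```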